DOCUMENT RESUME 



ED 353 26A 



TM 018 905 



AUTHOR       Hodges, Carol A.
TITLE        Teacher Judgments and Standardized Assessments.
PUB DATE     Apr 92
NOTE         20p.; Paper presented at the Annual Meeting of the
             American Educational Research Association (San
             Francisco, CA, April 20-24, 1992).
PUB TYPE     Reports - Research/Technical (143) --
             Speeches/Conference Papers (150) -- Tests/Evaluation
             Instruments (160)
EDRS PRICE   MF01/PC01 Plus Postage.
DESCRIPTORS  *Academic Achievement; Comparative Analysis;
             Elementary School Students; *Elementary School
             Teachers; Evaluators; Grade 1; Grade 2; *Kindergarten
             Children; Literacy; Longitudinal Studies; Primary
             Education; Scores; *Standardized Tests; *Student
             Evaluation; *Teacher Role; Test Results; Test Use

ABSTRACT 

This study, part of a larger longitudinal research 
project, focused on the relationship between teachers' rankings of 
their students' literacy achievements, based on informal assessment, 
and the scores that the students received on a standardized test. 
Kindergarten students (initial sample of 136) were followed for 3 
years. At the end of kindergarten, grade 1, and grade 2, teachers 
evaluated the students' progress according to how well they mastered 
a set of criteria that represented a successful reader and writer. 
Teachers administered a standardized test to students each year. Data 
from all 3 years illustrated significant positive relationships 
between teachers' evaluations and test scores. The fact that the 
teachers and the tests appear to be measuring several similar factors 
should ease fears that teacher judgments might be at odds with that 
which is currently considered more reliable, test scores. However, 
data do suggest that teacher judgments may be an even more valid 
measure than standardized test scores. One table presents study data, 
2 figures present the assessment forms for reading and writing, and 
13 references are included. (Author/SLD) 



Reproductions supplied by EDRS are the best that can be made
from the original document.






U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

□ This document has been reproduced as
received from the person or organization
originating it.

□ Minor changes have been made to improve
reproduction quality.

• Points of view or opinions stated in this document
do not necessarily represent official
OERI position or policy.



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



TEACHER JUDGMENTS AND STANDARDIZED ASSESSMENTS 

paper presented at the 
1992 AERA ANNUAL MEETING 

by 

Dr. Carol A. Hodges 

Associate Professor 
Buffalo State College 




ABSTRACT 

The belief that data from standardized tests are more trustworthy than data 
collected by other means is often held by school administrators despite the fact 
that reading researchers have found that reading assessment has not kept pace 
with advances in research, theory, and practice and that early childhood experts 
agree that extensive standardized achievement testing narrows and misdirects 
the curriculum and drains instructional time. 

This study, part of a larger longitudinal research project, focused on the
relationship between teachers' rankings of their students' literacy achievements,
based on informal assessment, and the scores that the students received on a
standardized test. Kindergarten students were followed for three years. At the
end of their kindergarten, first grade, and second grade school years, teachers
evaluated the students' progress according to how well they mastered a set of
criteria which represented a successful reader and writer at the end of each
grade. Then the teachers administered a standardized test to the students.

Data from all three years illustrated there were significant positive relationships
between the teachers' evaluations and the test scores. Correlations ranged from
.57-.90 (p < .01). The fact that the teachers and the test appear to be measuring a
number of similar factors should ease fears that teacher judgments might be
totally at odds with what currently are considered to be the more reliable, the
test scores.

At the same time, knowing what we do about the negative factors associated with 
standardized testing in the primary grades and the fact that little use seems to be 
made of the test results, the data suggest that teacher judgments, based on 
knowledge of their students' development and knowledge of the processes 
involved in reading and writing, may be even more valid means of obtaining 
assessment information. 






INTRODUCTION 

The use of standardized tests has increased dramatically over the past few
decades, and the trend toward more testing seems likely to continue. However, as
the emphasis on standardized tests has escalated, so have objections to them. A
number of reading researchers (Edelsky & Harman, 1988; Garcia & Pearson, 1991;
Hodges, 1989, 1991; Squires, 1987; Teale, 1988; Valencia & Pearson, 1986) have
pointed out that early reading assessment has not kept pace with advances in
reading research, theory, and practice. At the same time, early childhood experts
(Bredekamp, 1986; FairTest & NYPIRG, 1990; International Reading Association,
1986; Harman, 1990; Moyer, Egertson, & Isenberg, 1987; National Association for
the Education of Young Children, 1988) argue that children are being tested too
early. They claim that young children are not good test takers; that the
unfamiliar format leads to stress; that test results are influenced by the children's
ability to sit still and be quiet; and that extensive testing narrows and misdirects
the curriculum and drains instructional time without a clear demonstration that
the investment is beneficial. In addition, groups as diverse as the American
Association of Colleges for Teacher Education (AACTE), the American Federation
of Teachers (AFT), the National Association of Elementary School Principals
(NAESP), and the national PTA have urged states to abandon the use
of multiple-choice tests and to replace them with alternative assessment
techniques which seek to measure directly the student's ability to perform in the
subject area. Several years ago, when I examined a variety of standardized
readiness and early reading tests, I found that such tests:






• have stayed the same for virtually sixty years. They
reflect behavioral research that assumes that literacy can only
be taught through the direct instruction of isolated skills which
are hierarchically organized and mastered one level at a
time.

• are inappropriate for literacy assessment based on a holistic
philosophy because they measure only reading skills and
ignore other components of literacy such as speaking,
listening, and writing, components which we now realize are
highly correlated with success in reading. And even in the area
of reading these tests measure only a narrow range of the
knowledge and skills involved. For instance, the readiness and
early reading tests I surveyed focused on phonic skills and
vocabulary recognition, ignoring other important aspects such as
background knowledge, directionality, print awareness, other
word recognition strategies, comprehension strategies,
appreciation of books, and interest in reading and writing.

• are divorced from the reality of authentic reading and
writing situations. They depend on multiple-choice
questions, and the materials read consist of single letters,
words, and sentences, or very brief passages. Furthermore,
pupils' background knowledge and interests are ignored; the tests
are timed; and pupils are not allowed to consult with one another.
Obviously, such materials and settings are as far from authentic
reading and writing situations as possible!

Unfortunately, despite their apparent potential, the acceptance of alternative 
assessment techniques has been constrained by concerns about their lack of 
efficiency and objectivity. Because it is easy to get a false sense of security when 
skilled reading is equated with scores on reading tests, many school personnel 
believe that data from standardized tests are more trustworthy than data 
collected by other means. 






THE PROBLEM 



After completing the survey of early reading tests, I began investigating the
primary level (grades K-2) literacy program and assessment tools of a school
district in a small suburban community. This district tested all of its students
each May, beginning in kindergarten, with a widely used standardized test
battery. After interviewing the kindergarten teachers, I discovered that they
administered standardized reading achievement tests to students very
reluctantly. They resented the time that the administration of the test took from
instruction, the pressures that it put on the curriculum, and the frustration that it
caused their students. In addition, because these teachers had made a
transition from a basal readiness program to a more developmentally based,
process-oriented literacy program, they felt the need to have a variety of
assessment tools for the everyday instructional decision-making that is a crucial
part of that approach. But they were not sure how to use informal assessment,
and even they wondered whether the informal tools could provide valid and
reliable data.

The questions most often asked by the administration and the teachers were,
"How would the teachers' assessment of students based on alternative evaluation
techniques compare to the way in which the standardized test assessed them?
How valid would teacher judgments be?" The results of this study begin to provide
information to answer those questions.



THE STUDY 



Teacher Ratings: Year One

In May of 1989, before they administered the standardized achievement test, I
asked seven kindergarten teachers to evaluate their 136 students according to
how well they had mastered a set of criteria which the teachers felt represented
the successful reader and writer at the end of kindergarten. Among the criteria
reported by the teachers were the following:

•attitude toward books and reading/writing 

•recognition of the letters of the alphabet 

•knowledge of grapheme/phoneme correspondences 

•use of invented spelling in writing 

•ability to listen to and comprehend stories 

•ability to read independently

•general maturity (following directions and keeping to a task) 

The teachers assessed their students as being above average readers/writers (3), 
average readers/writers (2), and below average readers/writers (1) based on 
these criteria. They used a variety of assessment techniques including anecdotal 
records, observation checklists, and work samples. See Figures One and Two for 
examples of checklists used by the kindergarten teachers. 

(Insert Figures One and Two about here) 

The standardized test which the teachers later administered to their
kindergartners purported to assess skills in auditory discrimination,
grapheme/phoneme correspondence, decoding, and listening comprehension. The
test scores examined were the Total Reading stanine scores (9-1) which the
students earned.

A comparison of the teacher assessments with the Total Reading stanines
reported on the standardized test showed a significant relationship
between the two. The degree of the relationship between the teacher ranked
groups and the test scores was computed by using the Pearson Product Moment
correlation coefficient. Table One illustrates that the correlations for the
classes ranged from .5a-.67 (p < .01). A correlation of .71 (p < .01) was found
over all classes.
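
The Pearson product-moment calculation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `pearson_r` and the eight-student class are hypothetical, and the numbers are invented for the example rather than taken from the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical class of eight students (illustrative values, NOT study data).
# Teacher rating: 1 = below average, 2 = average, 3 = above average.
teacher_ratings = [1, 1, 2, 2, 2, 3, 3, 3]
# Total Reading stanine scores on the 1-9 scale.
total_reading_stanines = [2, 3, 4, 5, 6, 7, 8, 9]

r = pearson_r(teacher_ratings, total_reading_stanines)
print(f"r = {r:.2f}")
```

Because the teacher rating has only three levels while the stanine has nine, the two scales differ in granularity; the coefficient still summarizes how closely the two orderings of students agree.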

(Insert Table One about here) 



Teacher Ratings: Year Two 

Near the end of these same students' first grade experience (May 1990), I asked 
their six first grade teachers to evaluate them according to how well they had 
mastered a set of criteria which the teachers felt represented the successful 
reader and writer at the end of first grade. Among the criteria reported by the 
teachers were the following: 

•ability to handle books 

•knowledge of how print works 

•attitude toward books and reading/writing 

•knowledge of grapheme/phoneme correspondences 

•use of invented spelling in writing 

•ability to listen to and comprehend stories 

•ability to read independently 

The teachers assessed their students as being above average readers/writers (3),
average readers/writers (2), and below average readers/writers (1) based on
these criteria, using a variety of informal assessment techniques. Over this
second year the original population of 136 students was reduced to 117 because
nineteen students who had been in the original kindergarten cohort were lost.






The standardized test which the teachers later administered to their first graders 
was composed of subtests which assessed grapheme/phoneme correspondence, 
decoding, vocabulary, listening comprehension, and reading comprehension skills. 
Once more, the test scores examined were the Total Reading stanine scores (9-1) 
which the students earned. 

A comparison of the teacher assessments of their students and the Total Reading
stanines reported on the standardized test illustrated a significant relationship.
(See Table One.) Correlations for the six classes ranged from .70-.a4 (p < .01). A
correlation of .67 (p < .01) was found over all teachers.



Near the end of the third year (1990-91), the students' six second grade teachers
were also asked to assess their students' reading/writing ability in the manner
previously done by the kindergarten and first grade teachers. The set of criteria
which the teachers felt represented the successful reader and writer at the end of
second grade were the following:

•ability to handle books 

•knowledge of how print works 

•attitude toward books and reading/writing 

•knowledge of grapheme/phoneme correspondences 

•use of invented spelling in writing 

•ability to listen to and comprehend stories

•ability to comprehend print in a variety of situations 

•ability to read independently 

•ability to follow directions 

The subtests represented on the standardized test that year were letters and
sounds, vocabulary, and comprehension. Seven more students from the original
kindergarten cohort had left the school, leaving a total of 110 students whose
progress was followed over the three years. The second grade teachers, too, used
a variety of assessment techniques.






A comparison of the second grade teacher assessments with the Total Reading
stanines reported on the standardized test again illustrated that there was a
significant relationship between the assessments of the students by the teachers
and the Total Reading stanines obtained by the students on the standardized test.
(See Table One.) The degree of the relationship between the teacher assessed
groups ranged from .63-.90 (p < .01).

Patterns of Difference: When the teacher assessments and the test scores are
reviewed for individual teachers, one interesting pattern becomes apparent.
Because teachers were asked to evaluate their students as above average,
average, or below average, they were in a sense encouraged to categorize some
pupils in each class as below average. The standardized test was not forced to do
so. Therefore, in some classes no children received a below average stanine score
(1-5), yet some did receive a below average assessment from the teacher. Any
replication of this study should word directions to teachers carefully so that they
do not feel forced to place students in a below average category.
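
A replication could screen for this pattern mechanically. The sketch below is not part of the original study: the function name, the record layout, the sample records, and the stanine cutoff passed in the example are all hypothetical, and the cutoff should be set to whatever band the test publisher defines as below average.

```python
def classes_with_forced_low_ratings(records, below_average_max_stanine):
    """Return class ids where a teacher rated at least one student below
    average (rating 1) but no student's stanine fell in the below-average band.

    records: iterable of (class_id, teacher_rating, stanine) tuples.
    below_average_max_stanine: highest stanine counted as below average
        (an assumption supplied by the caller, not a value from the study).
    """
    by_class = {}
    for class_id, rating, stanine in records:
        by_class.setdefault(class_id, []).append((rating, stanine))

    flagged = []
    for class_id, pairs in by_class.items():
        teacher_low = any(rating == 1 for rating, _ in pairs)
        test_low = any(stanine <= below_average_max_stanine for _, stanine in pairs)
        if teacher_low and not test_low:
            flagged.append(class_id)
    return sorted(flagged)


# Hypothetical records (illustrative values, NOT study data).
sample_records = [
    ("A", 1, 4), ("A", 2, 5), ("A", 3, 8),  # teacher gave a 1, no low stanine
    ("B", 1, 2), ("B", 2, 5), ("B", 3, 7),  # teacher gave a 1, low stanine present
]

flagged = classes_with_forced_low_ratings(sample_records, below_average_max_stanine=3)
print(flagged)
```

Classes flagged in this way are the ones where the three-category directions, rather than actual low achievement, may have produced the below average ratings.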

CONCLUSIONS AND IMPLICATIONS

What evidence would prove that teacher judgments can be valid measures of
reading/writing achievement? If we were to develop a new test of
reading/writing achievement, we would have to find a valid criterion measure of
reading/writing to establish the new test's concurrent validity. Because we know
that there are no perfect measures of reading/writing achievement, we would
probably use other achievement tests that are presumed to be valid. Then, if our
new test elicited test scores correlating significantly with the other tests, we
would conclude that our new test was a valid measure of achievement. Can we
use the correlations found between the teacher assessments and the test scores to
establish concurrent validity?

The question may really be, "Do we want to?" First, can we presume the test used
by the school district in this study to be a valid one? The technical manual of the
test used states that the test is expected to correlate significantly with other
achievement measures but offers no specific data to support the claim. And how
do we know that the other tests are valid measures? As has already been stated,
most reading assessment has not kept pace with advances in reading research,
theory, and practice. Even if this particular test correlated highly with other
similar tests, would it necessarily be a valid test of reading/writing as they are
conceived of in this school district?

Perhaps a better question might be, "What kinds of information are used in a
school district when decisions are being made?" First and second grade teachers
who were interviewed about their use of previous end-of-the-year test results
and assessments made by their students' previous teachers unanimously chose
the previous teachers' assessments over the test results. Even the principal of the
school reported that she rarely used the test results for school-wide instructional
decisions. She, too, preferred the more encompassing information obtained from
the teachers.

Obviously the results of this study are limited because the population consisted of
only one school district. However, having found such consistency of high
correlations over the kindergarten, first, and second grade teachers, I believe that
the teacher and the test measures can be said to be measuring a number of similar
factors. The coefficient of determination for the entire set of classes each year
ranged from .45-.50, leading me to believe that the teachers and the test were
tapping about fifty percent of the same factors. These relatively high correlations
of teacher judgment with standardized tests should ease fears that teacher
judgments would be totally at odds with the standardized test results. At the same
time, knowing what we do about the negative factors associated with standardized
tests and testing in the primary grades and the fact that little use seems to be
made of the test results, the data suggest that teacher judgments, based on
knowledge of their students' development and knowledge of the processes
involved in reading and writing, may be even more valid means of obtaining
information for instructional decisions. I urge others to replicate this study. If
pupil assessments by teachers in other school districts also correlate highly with
test scores, then the notion of subjectivity in the alternative forms may not be a
negative factor as some consider it now.
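
The coefficient of determination is the square of the correlation coefficient. As a brief sketch, the two overall correlations reported in the text (.71 for kindergarten and .67 for first grade) can be squared to reproduce the .45-.50 range; the function and variable names here are illustrative choices, not part of the study.

```python
def coefficient_of_determination(r):
    """Proportion of shared variance implied by a correlation coefficient r."""
    return r * r


# Overall correlations reported in the text for the first two years.
overall_correlations = {"kindergarten": 0.71, "first grade": 0.67}

for grade, r in overall_correlations.items():
    r_squared = coefficient_of_determination(r)
    print(f"{grade}: r = {r:.2f}, r squared = {r_squared:.2f}")
```

Squaring .71 gives about .50 and squaring .67 gives about .45, which is the basis for the statement that the teachers and the test shared roughly half of their variance.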




REFERENCES

Bredekamp, S. (Ed.). (1986). Developmentally appropriate practice. Washington,
D.C.: National Association for the Education of Young Children.

Edelsky, C. & Harman, S. (1988). One more critique of reading tests-with two
differences. English Education, 20, 157-71.

FairTest & NYPIRG. (1990). Standardized tests and our children: A guide to
testing reform in New York. New York: The National Center for Fair and
Open Testing.

Garcia, G. & Pearson, P. D. (1991). The role of assessment in a diverse society. In E.
Hiebert (Ed.), Literacy for a diverse society: Perspectives, practices, and
policies (pp. 253-276). New York: Teachers College Press.

Harman, S. (1990). Negative effects of achievement testing in literacy
development. In C. Kamii (Ed.), Achievement testing in the early grades (pp.
111-116). Washington, D.C.: National Association for the Education of Young
Children.

Hodges, C. (1989, March). Readiness tests: An obsolete approach to emergent
literacy assessment. Paper presented at the annual meeting of the
American Educational Research Association, San Francisco, CA.

Hodges, C. (1991). Instruction and assessment of emergent literacy. In L. Weis, P.
Altbach, G. Kelly, & H. Petrie (Eds.), Critical perspectives on early childhood
education (pp. 153-166). New York: State University of New York Press.

International Reading Association, Early Childhood and Literacy
Development Committee. (1986). Literacy development and pre-first grade.
Childhood Education, 63, 110-11.

Moyer, J., Egertson, H. & Isenberg, J. (1987). Association for Childhood
Education International position paper: The child-centered
kindergarten. Childhood Education, 63, 235-42.

National Association for the Education of Young Children. (1988). NAEYC
position statement on standardized testing of young children 3 through 8
years of age. Young Children, 43, 42-47.

Squires, J. (1987, April). Introduction: A special issue on the state of assessment in
reading. The Reading Teacher, 40, 724-725.

Teale, W. (1988). Developing appropriate assessment of reading and writing
in the early childhood classroom. The Elementary School Journal, 89,
173-183.

Valencia, S. & Pearson, P. D. (1986). New models for reading assessment. Reading
Education Report 71. Urbana, IL: University of Illinois, Center for the Study of
Reading.



TABLE ONE

[Correlations between teacher assessments and Total Reading stanine scores,
listed by classroom for kindergarten, first grade, and second grade. The
individual values are not legible in the source scan.]



FIGURE ONE

Informal Assessment: Emergent and Early Reading Strategies

NAME:

+ - knows
* - learning          DATES:

Identifies front of book
Knows where to start reading
Aware of page turning direction
Aware of top-bottom reading
Aware of left-right
Aware of return sweep

Knows punctuation
  period
  question mark
  exclamation mark

Can identify a letter
Can identify a word
Knows print contains message

Finger pointing
  no attempt
  [illegible]
  word by word

Knows Book Terms
  cover
  title page
  author
  illustrator
  pages
  other

Notes:

Informal Assessment - Reading Cont'd

NAME:

* - learning          DATES:

Story Retelling/Reading
  own version
  retells all important points
  retells parts
  retells almost none
  partially memorized
  partially reading print
  reads all print

Knows g/p correspondence (circle)
  b c d f g h j k
  l m n p q r s t
  v w x y z
  a e i o u

Sight words:

FIGURE TWO

Informal Assessment: Writing Strategies

NAME:

+ - knows
* - learning          DATES:

Drawing
  simple
  detailed
  dictates story

Scribble writing
  uncontrolled
  controlled
  left/right
  top/bottom
  random letters

Copies words

Invented spelling
  initial consonants
  initial & final
  vowels-incorrect
  vowels-correct
  some standard
  most standard

Spacing
  strings of letters
  space between letters
  strings of words
  space between words

Composition
  labels/words
  phrases/sentences

Informal Assessment: Writing Strategies cont'd.

NAME:

+ - knows
* - learning          DATES:

Storyline
  new theme each page
  theme continuity
  lit. influence/pattern book
  fiction
  non-fiction

Punctuation
  periods rarely
  end of sentence
  overuse
  question mark rarely
  end of sentence
  overuse
  exclamation mark rarely
  end of sentence
  overuse
  other

Handwriting
  all caps
  caps and lower case

Work Style
  works alone
  asks peers for help
  helps others
  shares and discusses ideas

NOTES:

Informal Assessment: Writing Strategies cont'd.

NAME:

+ - knows
* - learning          DATES:

Interest
  journal writing
  writes on own
  writes when asked
  must be coaxed
  center writing
  chooses to write
  prefers to dictate
  writes books

Sharing Time
  chooses not to share
  describes picture-simple
  complex
  reads words
  points to words when reading
  uses expression when reading
  adds info when asked
  listens to others
  asks questions of others
  responds with write/book talk
  confused telling and/or asking



